Word intuition agreement among Chinese speakers: a Mechanical Turk-based study
Abstract
Word intuition is speakers’ intuitive knowledge of wordhood. Collective word intuition is the word intuition of the whole language community. Given this definition, the optimal word segmentation result in Chinese NLP should reflect collective word intuition. It is also believed that an ideal definition of the Chinese word should accord with the collective word intuition of Chinese speakers. To test the validity and feasibility of modeling collective word intuition, it is important to know to what extent Chinese speakers agree with each other on what a word is. In this study, we measured word intuition agreement using a Mechanical Turk-based Chinese word segmentation experiment. Three metrics were used: proportionate agreement, Cohen’s kappa, and Fleiss’ kappa. The results show that Chinese speakers agree with each other almost perfectly on what a word is. We found no evidence of an effect of semantic transparency on word intuition agreement. Such high word intuition agreement among Chinese speakers supports the psychological reality of the Chinese word and also suggests that it is quite feasible to formulate a definition of the Chinese word by modeling the collective word intuition of Chinese speakers.
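As a rough illustration of the three agreement metrics named in the abstract, the sketch below computes them on toy data. Everything here is an assumption for illustration only: the abstract does not say how segmentations were encoded, so the sketch treats each gap between adjacent characters as one item labelled 1 (boundary) or 0 (no boundary), and the three rater vectors are invented, not data from the study.

```python
from collections import Counter

def proportionate_agreement(a, b):
    """Fraction of items on which two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = proportionate_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` is a list of raters, each a list of labels."""
    n_raters = len(ratings)
    n_items = len(ratings[0])
    totals = Counter()  # label counts pooled over all raters and items
    p_bar = 0.0         # mean per-item agreement
    for i in range(n_items):
        counts = Counter(r[i] for r in ratings)
        totals.update(counts)
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar /= n_items
    p_e = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)  # undefined when p_e == 1

# Hypothetical boundary decisions for 8 character gaps (1 = word boundary).
r1 = [1, 0, 1, 0, 0, 1, 0, 1]
r2 = [1, 0, 1, 0, 0, 1, 0, 1]
r3 = [1, 0, 1, 1, 0, 1, 0, 1]

print(proportionate_agreement(r1, r3))  # 0.875
print(cohen_kappa(r1, r3))              # 0.75
print(fleiss_kappa([r1, r2, r3]))
```

Both kappa statistics correct raw agreement for the agreement expected by chance, which is why they come out lower than the proportionate agreement whenever the raters disagree at all.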
Similar resources
Sifu: Interactive Crowd-Assisted Language Learning
This paper introduces SIFU, a system that recruits in real time native speakers as online volunteer tutors to help answer questions from Chinese language learners in reading news articles. SIFU integrates the strengths of two effective online language learning methods: reading online news and communicating with online native speakers. SIFU recruits volunteers from an online social network rathe...
Bilingualism, Biliteracy and Metalinguistic Awareness: Word Awareness in English and Japanese Users of Chinese as a Second Language
Cross-linguistic research shows that some aspects of metalinguistic awareness are affected by characteristics of different writing systems. Users of writing systems that mark word boundaries (such as English) develop word awareness, while users of unspaced writing systems (such as Chinese) do not. Previous research showed that English-speaking users of Chinese as a Second Language (CSL) have hi...
Clustering dictionary definitions using Amazon Mechanical Turk
Vocabulary tutors need word sense disambiguation (WSD) in order to provide exercises and assessments that match the sense of words being taught. Using expert annotators to build a WSD training set for all the words supported would be too expensive. Crowdsourcing that task seems to be a good solution. However, a first required step is to define what the possible sense labels to assign to word oc...
Exploring Mental Lexicon in an Efficient and Economic Way: Crowdsourcing Method for Linguistic Experiments
Mental lexicon plays a central role in human language competence and inspires the creation of new lexical resources. The traditional linguistic experiment method which is used to explore mental lexicon has some disadvantages. Crowdsourcing has become a promising method to conduct linguistic experiments which enables us to explore mental lexicon in an efficient and economic way. We focus on the fe...
Influence of suprasegmental features on perceived ethnicity of American politicians
How accurate are listeners at identifying the ethnicities of political figures from one-word samples? Do suprasegmental variables provide a basis for these judgments? Tokens of six lexical items were extracted from speeches by seven male political figures of different stated ethnic identities. In a Mechanical Turk experiment, 94 listeners heard each token twice, then responded to the multiplech...
Publication date: 2017